U.S. Department of Energy
Office of Inspector General
Office of Audit Services

Audit Report
Management Controls over
Selected Departmental Critical
Monitoring and Control Systems

OAS-M-05-06
June 2005

REPORT ON MANAGEMENT CONTROLS OVER SELECTED
DEPARTMENTAL CRITICAL MONITORING AND CONTROL
SYSTEMS

TABLE OF CONTENTS

   Protecting Critical Monitoring and Control Systems

   Details of Finding
   Recommendations
   Comments

   Appendices

   1. Objective, Scope, and Methodology
   2. Prior Reports
   3. Management Comments

PROTECTING CRITICAL MONITORING AND CONTROL
SYSTEMS

Ensuring Continuation or Restoration of Essential Operations

The Department of Energy (Department) could not
ensure that it could continue operations or quickly restore
selected critical monitoring and control systems in the
event of an emergency.
Specifically, management had not
fully assessed risks or taken adequate steps to mitigate the
foreseeable risks confronting the six critical monitoring and
control systems we reviewed.

Risk Assessments

Management had not fully assessed the risk and
cost-benefit of risk mitigation strategies for three of the six
systems we reviewed, including Argus, which is a system
deployed at a number of sites to control access to facilities
that house critical information and nuclear materials. To
manage risk, the Federal Information Security
Management Act requires agencies to assess, mitigate, and
periodically reevaluate risks and security measures for all
major systems. Risk assessments enable management to
identify threats, vulnerabilities, and the likelihood of
adverse actions or potential consequences. Specifically,
management had not:

  • Conducted a comprehensive risk assessment for
    Argus at Lawrence Livermore National
    Laboratory (Livermore). While program officials
    had addressed the risk of the system and of its
    network not being available, they had not
    identified and mitigated risks posed to the system
    by natural disasters, environmental hazards, or
    human error.
    For example, the risk of human
    error was increased because an administrator was
    responsible for both reviewing vulnerability scans
    and implementing corrective actions to address
    identified vulnerabilities. To the site's credit,
    management took action to correct this problem
    when we brought it to their attention.

  • Fully analyzed the cost and benefit of strategies to
    mitigate identified threats and vulnerabilities
    posed to the Distributed Control System, which
    controls the emergency oil flow at the Strategic
    Petroleum Reserve. For example, at the time of
    our site visit, management had not completed its
    assessment of the likelihood of exploitation of
    identified vulnerabilities or the impact of such
    exploitation on the system and its energy resource
    mission.
    Consequently, management could not
    fully assess the probability and consequence of
    vulnerabilities being exploited or evaluate the cost
    and benefits of eliminating such issues.
    Subsequent to our field visit, management
    informed us that they had completed a risk
    assessment that included risk mitigation action
    plans.

  • Completed, as of the time of our site visits, a
    documented risk assessment to
    identify and evaluate potential system threats and
    vulnerabilities for the Argus system that is used to
    control access to the Department Headquarters
    (Headquarters).

Risk Mitigation

Management did not take necessary actions to mitigate
foreseeable risks associated with critical monitoring and
control systems. Specifically, management had not fully
developed and tested contingency plans for three of the six
systems to ensure that emergency situations would be
effectively managed. Also, it did not ensure it could
recover from an incident by protecting system backup
capabilities from the same risks posed to the primary
systems.
Four of the six systems either had their backup
systems and/or backup software co-located with the
primary system or had not provided backup capability to
control critical processes during an emergency situation.
For example:

  • The Western Area Power Administration's
    (Western) Supervisory Control and Data
    Acquisition (SCADA) system, which helps
    control the flow of electricity to a regional power
    grid, had its backup copies of system software and
    data co-located with the primary system, in part
    because its contract for off-site storage had
    lapsed. It also had its backup system co-located
    with the primary system. The Livermore Argus
    system also had backup copies of system software and
    data co-located with the primary system. In
    addition, contingency plans for recovering
    mission capability for both the Western and
    Livermore systems had not been completed or
    tested. Near the end of our audit we learned that
    Western had purchased and was configuring a
    secondary control system that management
    planned to locate off-site.
    Also, subsequent to our
    review, Western indicated that it had arranged to
    ship data backups to an off-site storage facility on
    a regular basis.

  • The Savannah River Site's Distributed Control
    System used to control the flow of tritium at its
    processing facility did not have a backup system
    and did not have a plan that fully addressed
    recovery of system capability.

  • The Strategic Petroleum Reserve Distributed
    Control System, used to control the flow of oil to
    emergency reserves, had its primary and backup
    system located in the same room. The system
    contingency plan at the Strategic Petroleum
    Reserve was limited to procedures for obtaining a
    manual replacement system in the event of system
    failure, and did not address the susceptibility to
    failure caused by common disasters due to
    maintaining primary and backup systems in the
    same control room.
    Had the primary and backup
    systems been adequately separated, the need to
    adopt manual methods (a process described as
    costly by a site engineer) might have been
    eliminated in the event of a localized disaster.

Risk Management

Site management had not sufficiently considered and
periodically evaluated the risk that critical monitoring and
control systems we reviewed would become inoperable and
unable to be restored in a timely manner. For example, five
of the six systems reviewed had not been certified and
accredited (C&A) for operation according to National
Institute of Standards and Technology (NIST) guidance at
the time of our site visits (only the Savannah River Site had
certified and accredited the Defense Waste Processing
Facility's Distributed Control System). C&A represents
senior management's decision to authorize the system for
operation and requires management to explicitly accept the
risk of operating the system based on an agreed upon set of
security controls. According to NIST, systems cannot be
properly certified and accredited unless management
ensures that it completes a risk assessment, security plan,
contingency plan, and necessary mitigating controls.
In
commenting on a draft of our report, management officials
stated that they had begun or completed implementing a
comprehensive risk management process, to include C&A
of systems and other mitigating actions.

Furthermore, the Department had made only limited use of
its own internal experts in evaluating the risks pertaining to
critical monitoring and control systems. The Department
established a group of energy infrastructure control system
experts at Sandia National Laboratory that have provided
advice to private sector utilities regarding critical control
systems. These experts told us that they had received very
few internal requests to utilize their expertise. Of the sites
we reviewed, only the Strategic Petroleum Reserve availed
itself of these experts. Officials at one site we visited
stated that they were not aware that these experts were
available. Had their services been utilized, many of the
weaknesses we identified may have been disclosed and
corrected.
After our field work was complete, management
officials told us that they intend to utilize Departmental
expertise in the future where appropriate.

Critical Infrastructure and Public Safety

Critical monitoring and control systems were vulnerable
to disruptions due to disasters or other emergencies.
Assessing and mitigating risks to these systems may help
prevent extended system shutdowns that could lead to
wide-scale disruptions to electricity grids, the inability to
maintain controlled access to critical information and
nuclear materials, or the use of costly alternatives to
provide emergency energy supplies in the event of a
national crisis.

The lack of adequate backup systems and contingency
planning was recently highlighted as part of the cause of
the August 2003 blackout in the northeast portion of the
United States. Key monitoring systems failed, thereby
preventing electricity control operators from detecting a
short circuit in the grid, resulting in a cascading power
failure across the northeastern United States.
A joint
United States and Canadian task force investigating the
blackout also noted that it was caused in part by failure to
conduct multiple contingency and extreme condition
assessments and to have backup monitoring tools available
after the primary alarming/monitoring systems failed.

RECOMMENDATIONS

To ensure that the Department's critical monitoring and
control systems are able to continue operation in the event
of emergencies, we recommend that the Associate
Administrator for Management and Administration,
National Nuclear Security Administration; the Principal
Deputy Assistant Secretary for Fossil Energy; the Director,
Office of Security and Safety Performance Assurance; and
the Administrator, Western Area Power Administration
ensure that critical monitoring and control system owners:

  1. Implement a comprehensive risk management
     process for their critical monitoring and control
     systems. This process should include:

       a. Periodically assessing and mitigating risk
          to these major systems, including the
          completion of risk assessments,
          contingency plans, and certification and
          accreditation of systems; and

       b. Ensuring that backup systems and media
          are located a sufficient distance from the
          primary system to facilitate system
          recovery, to include consideration of
          off-site locations.

  2. Take advantage of Department expertise to
     periodically evaluate and strengthen management
     controls over critical monitoring and control
     systems.

MANAGEMENT REACTION AND AUDITOR COMMENTS

Management generally concurred with the report's overall
conclusion and recommendations, but offered clarifying
remarks or disagreed with some of our conclusions
regarding specific systems.

Proposed and stated actions are generally responsive to our
recommendations. Based on management's comments, we
modified our report where appropriate and deleted a
recommendation "to evaluate the need for remote system
operation capability." We have also made a number of
other technical corrections to our report to address
management's comments.

In reference to specific site comments, management
reaction and the auditor responses follow.

Livermore Argus System

Management stated that while risk mitigation is a concern,
they considered Livermore's compensatory measures to be
adequate.
They also noted they maintain this system's
operational software and backup tapes in a separate area
where they are readily available.

Management acknowledged that, despite a growing need, a
formal, comprehensive and documented risk assessment
has not occurred. We believe that in the absence of such an
analysis, management cannot be assured that all risks have
been fully assessed and properly mitigated for these
systems. We do not agree that having Livermore Argus
backup data in an area separate from the live Argus system
is sufficient, since we found that the separate area is
adjacent to the room housing the live Argus system. Thus,
the backup tapes may be subject to the same localized
disaster as the primary Argus system, such as flooding or fire.

Headquarters Argus System

Management indicated that based on an assessment of the
impacts of adversary data theft at another site's Argus
system, the impacts on the Headquarters system would be
minimal since it operates on a closed Local Area Network.
Officials also stated that they have a "fail-over" scheme to
mitigate loss of functionality at either the Forrestal or the
Germantown facility by utilizing the other site's host
computer in the event of a system failure.
They also noted
that impacts due to human error are mitigated by providing
a limited number of trained individuals and a well-trained
protective force to solely perform critical system functions
and restore the Headquarters system's access control
system should it be rendered inoperative.

Nonetheless, a Headquarters official told us that the link
between the two sites was interrupted on more than an
occasional basis. The Headquarters Argus "fail-over"
scheme to mitigate loss of functionality at either the
Forrestal or the Germantown facility does not address the
fact that connectivity between the two sites is adversely
affected when the main data link between the sites is
disrupted.

Western's SCADA System

Management stated that the system was certified and
accredited under independent review in compliance with
NIST requirements and this information should be updated
in the report's Risk Management section.

The Office of Inspector General does not agree with
Western's assertion that the SCADA system had been
certified and accredited in accordance with NIST
requirements at the time of our review. The certification
documentation we were provided and examined lacked
various elements needed to be consistent with NIST
guidance, such as the existence of contingency procedures
should a failure of the SCADA system occur.
Also,
Western management officials told us that the certification
and accreditation satisfied requirements of the North
American Electric Reliability Council, not necessarily
those of NIST.

Strategic Petroleum Reserve's Distributed Control System

Management explained that the controllers, input/output
modules, and the operator stations employ some form of
redundant backup that supports the metric of 95 percent
availability of systems [and] are all located in the same
control room to provide seamless recovery from any
equipment failure. Officials added that the backup for a
total system failure is to manually operate the existing site
process equipment. They also noted that in conformance
with design criteria, the functional specification for the
system excluded the remote operability of the process
equipment.
They stated that operational and security
concerns outweighed the potential benefits and that
manual operation of the site's process equipment as a
backup to total DCS failure is significantly less costly than
providing for remote capability.

Management had no documented risk assessment or cost
benefit analysis to support its decision to have the people,
process, and technology related to this system located in the
same control room and to rely on a costly manual process
to recover in the event of a total system failure. We believe
that Strategic Petroleum Reserve officials should have
documented how they arrived at their conclusions and thus
allowed management to make an informed decision
regarding whether to accept the associated risks during the
system accreditation process.

Management's comments are included in their entirety in
Appendix 3.

Appendix 1

OBJECTIVE

To determine whether selected critical monitoring and
control systems could continue operation in a crisis and/or
had the ability to be restored with minimal disruption and
information loss.

SCOPE

The audit was performed between October 2003 and
March 2005 at Department Headquarters, Washington, DC;
Lawrence Livermore National Laboratory, Livermore, CA;
Western Area Power Administration, Folsom, CA;
Savannah River
Site, Aiken, SC; and the Strategic
Petroleum Reserve, New Orleans, LA. Specifically, we
performed a comprehensive review of the agency's key
processes for managing critical monitoring and control
systems information technology resources.

METHODOLOGY

To accomplish our audit objective, we:

  • Reviewed a sample of the critical monitoring and
    control systems as identified by Department
    officials and the Project Matrix Step One Report,
    dated August 2003;

  • Reviewed applicable laws, regulations, guidance
    and best practices pertaining to managing
    information technology resources and initiatives.
    We also reviewed relevant reports issued by the
    Office of Inspector General and the Government
    Accountability Office;

  • Reviewed the Government Performance and
    Results Act of 1993 and determined if
    performance measures had been established for
    managing information technology resources;

  • Reviewed numerous documents related to the
    management of critical monitoring and control
    systems, including information technology risk
    management and contingency planning
    documentation; and,

  • Held discussions with program officials and
    personnel from the field
    sites.

The audit was conducted in accordance with generally
accepted Government auditing standards for performance
audits and included tests of internal controls and
compliance with laws and regulations to the extent
necessary to satisfy the audit objectives. We assessed
significant internal controls and performance measures in
accordance with the Government Performance and Results
Act of 1993 regarding the management of the Department's
critical monitoring and control systems. We did not
identify any performance measures specific to managing
critical monitoring and control systems. However, the
Office of the Chief Information Officer (OCIO) has begun
tracking information on the number of systems that have
been certified and accredited and have developed and tested
contingency plans.
Because our review was limited, it
would not necessarily have disclosed all internal control
deficiencies that may have existed at the time of our audit.
We did not rely on computer-processed data to accomplish
our audit objective.

An exit conference was held with appropriate management
officials on May 19, 2005.

Appendix 2

PRIOR REPORTS

Office of Inspector General Reports

  • The Department's Continuity Planning and Emergency Preparedness
    (DOE/IG-0657, August 2004). The report found five sites did not develop
    comprehensive plans to continue essential functions. Specifically, the sites had
    not fully identified essential functions or alternate facilities in case of emergency.
    Additionally, the Department did not have specific requirements for sites to
    validate the effectiveness of corrective actions addressing recognized
    preparedness weaknesses or to share complex-wide lessons learned about
    common problems. As a result, the Department may face increased risks to
    operations, employees, and surrounding communities during an emergency
    situation.

  • Electricity Transmission Scheduling at the Bonneville Power Administration
    (DOE/IG-637, February 2004). The report outlined the results of an audit
    conducted to determine whether the Bonneville Power Administration
    (Bonneville) has a scheduling system in place to meet current and future
    transmission needs in an automated, deregulated environment.
    Bonneville's
    system for scheduling transmission transactions did not fully meet its needs in the
    current operating environment. Bonneville's management of the replacement
    system lacked a comprehensive project plan, and system development and
    implementation procedures. The effectiveness of the project management effort
    was hampered by the lack of standardized transmission contracts. Automated
    scheduling would enhance Bonneville's electrical transmission grid by allowing
    Bonneville to react more quickly to disruptive events, such as a May 2003
    incident in which Bonneville exceeded the operating capacity of one of its
    transmission lines.

  • Planning for National Nuclear Security Administration Infrastructure
    (OAS-B-03-02, May 2003). The report outlined the results of an audit conducted
    to determine whether the National Nuclear Security Administration's (NNSA) site
    plans provided accurate and useful data to aid in the prioritization of mission
    critical facility renovation and repair projects. The OIG concluded, in part, that
    NNSA site plans did not identify or prioritize the mission critical facilities and
    infrastructure in need of repair or refurbishment.

  • Cyber-Related Critical Infrastructure Identification and Protection Measures
    (DOE/IG-0545, March 2002). The report outlined the results of an audit
    conducted to determine whether the Department had identified and developed
    protection measures for its critical cyber and related physical infrastructure
    assets.
While the Department had initiated certain actions designed to enhance\n        cyber security, it had not made sufficient progress in identifying and developing\n        protective measures for critical infrastructures or assets. Even in light of the\n        magnitude of the challenges it faces in this arena, the Department had not\n        devoted sufficient resources to identifying and developing protective measures\n        for cyber-related assets.\n\n\n   Government Accountability Office (GAO) Reports\n\n\n   \xe2\x80\xa2    Critical Infrastructure Protection: Challenges and Efforts to Secure Control\n        Systems (GAO-04-354, March 2004). GAO found that along with the increasing\n        cyber threats to control systems, other factors such as standardized technologies\n        with known vulnerabilities and increased connectivity increased the risk to these\n        systems. GAO noted that successful attacks on control systems could have\n        devastating consequences, such as endangering public health and safety.\n        Securing control systems poses significant challenges, including limited\n        specialized security technologies. Without effective coordination of efforts to\n        secure these systems, there is a risk of delaying the development and\n        implementation of more secure systems to manage our critical infrastructures.\n\n   \xe2\x80\xa2    Critical Infrastructure Protection: Challenges for Selected Agencies and\n        Industry Sectors (GAO-03-233, February 2003). GAO issued this report in\n        response to a Congressional request to assess the pace and progress of certain\n        Federal agencies (including the Department of Energy) and private sector\n        Information Sharing and Analysis Centers in achieving certain objectives\n        contributing to the protection of infrastructures critical to the nation. 
GAO\n        concluded that although the agencies under review had taken some actions to\n        implement critical infrastructure protection policy, they had not completed the\n        fundamental step of identifying their critical infrastructure assets and the\n        operational dependencies of these vital assets on other public and private assets.\n\n\n\n\n________________________________________________________________\nPage 12                                              Prior Reports\n\x0cAppendix 3\n\n\n\n\n________________________________________________________________\nPage 13                                     Management Comments\n\x0c                                                             IG Report No. OAS-M-05-06\n\n                       CUSTOMER RESPONSE FORM\n\nThe Office of Inspector General has a continuing interest in improving the usefulness of\nits products. We wish to make our reports as responsive as possible to our customers\'\nrequirements, and, therefore, ask that you consider sharing your thoughts with us. On the\nback of this form, you may suggest improvements to enhance the effectiveness of future\nreports. Please include answers to the following questions if they are applicable to you:\n\n1. What additional background information about the selection, scheduling, scope, or\n   procedures of the audit would have been helpful to the reader in understanding this\n   report?\n\n2. 
What additional information related to findings and recommendations could have\n   been included in the report to assist management in implementing corrective actions?\n\n3. What format, stylistic, or organizational changes might have made this report\'s\n   overall message more clear to the reader?\n\n4. What additional actions could the Office of Inspector General have taken on the\n   issues discussed in this report which would have been helpful?\n\n5. Please include your name and telephone number so that we may contact you should\n   we have any questions about your comments.\n\n\nName                                          Date\n\nTelephone                                     Organization\n\n\nWhen you have completed this form, you may telefax it to the Office of Inspector\nGeneral at (202) 586-0948, or you may mail it to:\n\n                           Office of Inspector General (IG-1)\n                                 Department of Energy\n                                Washington, DC 20585\n\n                              ATTN: Customer Relations\n\nIf you wish to discuss this report or your comments with a staff member of the Office of\nInspector General, please contact Wilma Slaughter at (202) 586-1924.\n\x0cThe Office of Inspector General wants to make the distribution of its reports as customer friendly\nand cost effective as possible. Therefore, this report will be available electronically through the\nInternet at the following address:\n\n               U.S. Department of Energy Office of Inspector General Home Page\n                                    http://www.ig.doe.gov\n\n  Your comments would be appreciated and can be provided on the Customer Response Form\n\x0c'